from langchain.embeddings.openai import OpenAIEmbeddings
embedding = OpenAIEmbeddings()
import numpy as np
import pandas as pd
from pyobsplot import Plot, d3, Math, js
14 Sep, 2023
Text embeddings are numerical representations of words, sentences, or documents: each piece of text is mapped to a vector of numbers, and texts with similar meanings end up with similar vectors. This makes them useful for many natural language processing tasks, such as sentiment analysis, machine translation, and question answering.
You can read more about text embeddings in this 👉 post.
The following is an example of sentence embeddings, showing the similarity between sentences. The similarity is calculated as the dot product of the embeddings; because OpenAI embeddings are normalized to unit length, this is equivalent to cosine similarity.
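To see why the dot product and cosine similarity coincide for unit-length vectors, here is a minimal sketch with two hypothetical 2-dimensional "embedding" vectors (real embeddings have hundreds or thousands of dimensions, but the arithmetic is the same):

```python
import numpy as np

# two hypothetical unit-length embedding vectors
a = np.array([0.6, 0.8])
b = np.array([0.8, 0.6])

# plain dot product
dot = float(np.dot(a, b))

# cosine similarity: dot product divided by the product of the norms;
# since both norms are 1, this equals the dot product
cos = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

For vectors that are not normalized, the two quantities differ, which is why the normalization of the embeddings matters here.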
sentences = [
"Good morning, how are you?",
"I am doing well, how about you?",
"Hi, how are you doing today?",
"I'm feeling a bit under the weather today.",
"I like apples.",
"One of my daughters doesn't like kiwis.",
"The other doesn't like bananas.",
"The earth is the third planet from the sun.",
"The moon is a natural satellite of the earth.",
"Jupiter is the fifth planet from the Sun and the largest in the Solar System.",
"The humpback whale is renowned for its enchanting songs, which are believed to serve various purposes, including communication, mating, and navigation during migration.",
"Dolphins, highly intelligent marine mammals, communicate with each other using a complex system of clicks, whistles, and body language, enabling them to work together in hunting and navigation.",
"The honeybee, through its pollination efforts, plays a vital role in agriculture, contributing to the growth of many of the fruits and vegetables humans rely on for sustenance.",
"The Large Plane Trees, also known as Road Menders at Saint-Rémy, is an oil-on-canvas painting by Vincent van Gogh.",
"Pablo Picasso's Guernica, an iconic mural-sized oil painting, stands as a poignant representation of the horrors of war.",
"This powerful artwork was created in response to the bombing of the town of Guernica during the Spanish Civil War."
]

embeddings = np.array([embedding.embed_query(sentence) for sentence in sentences])
dot_product_matrix = np.dot(embeddings, embeddings.T)
df = pd.DataFrame(dot_product_matrix, columns=range(1, len(embeddings)+1))
df['embedding_index'] = range(1, len(embeddings)+1)
df = df.melt(id_vars=['embedding_index'], var_name='embedding_index_2', value_name='similarity')

| # | Sentence |
|---|---|
| 1 | Good morning, how are you? |
| 2 | I am doing well, how about you? |
| 3 | Hi, how are you doing today? |
| 4 | I'm feeling a bit under the weather today. |
| 5 | I like apples. |
| 6 | One of my daughters doesn't like kiwis. |
| 7 | The other doesn't like bananas. |
| 8 | The earth is the third planet from the sun. |
| 9 | The moon is a natural satellite of the earth. |
| 10 | Jupiter is the fifth planet from the Sun and the largest in the Solar System. |
| 11 | The humpback whale is renowned for its enchanting songs, which are believed to serve various purposes, including communication, mating, and navigation during migration. |
| 12 | Dolphins, highly intelligent marine mammals, communicate with each other using a complex system of clicks, whistles, and body language, enabling them to work together in hunting and navigation. |
| 13 | The honeybee, through its pollination efforts, plays a vital role in agriculture, contributing to the growth of many of the fruits and vegetables humans rely on for sustenance. |
| 14 | The Large Plane Trees, also known as Road Menders at Saint-Rémy, is an oil-on-canvas painting by Vincent van Gogh. |
| 15 | Pablo Picasso's Guernica, an iconic mural-sized oil painting, stands as a poignant representation of the horrors of war. |
| 16 | This powerful artwork was created in response to the bombing of the town of Guernica during the Spanish Civil War. |
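The wide-to-long reshaping used above can be sketched on a tiny matrix, assuming the same column conventions as in the code (the real `dot_product_matrix` is 16×16):

```python
import numpy as np
import pandas as pd

# toy 2x2 similarity matrix standing in for dot_product_matrix
m = np.array([[1.0, 0.5],
              [0.5, 1.0]])

df = pd.DataFrame(m, columns=range(1, 3))
df['embedding_index'] = range(1, 3)

# melt produces one row per (i, j) pair, which is the long format
# that the plotting code expects
long = df.melt(id_vars=['embedding_index'],
               var_name='embedding_index_2', value_name='similarity')
```

A 2×2 matrix yields four rows, one per cell of the heatmap.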
Plot.plot(
    {
        "height": 640,
        "padding": 0.05,
        "grid": True,
        "x": {"axis": "top", "label": "Embedding Index"},
        "y": {"label": "Embedding Index"},
        "color": {"type": "linear", "scheme": "PiYG"},
        "marks": [
            Plot.cell(
                df,
                {"x": "embedding_index", "y": "embedding_index_2", "fill": "similarity", "tip": True},
            ),
            Plot.text(
                df,
                {
                    "x": "embedding_index",
                    "y": "embedding_index_2",
                    "text": js("d => d.similarity.toFixed(2)"),
                },
            ),
        ],
    }
)

You can see that the sentences are grouped together by similarity. For example, the sentences about fruit have high similarity with one another, as do the sentences about planets, while similarity across topics is noticeably lower.
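Instead of reading clusters off the heatmap, you can also rank pairs directly from the similarity matrix. A minimal sketch with a hypothetical 4×4 matrix (in the example above this would be `dot_product_matrix`):

```python
import numpy as np

# hypothetical symmetric similarity matrix with ones on the diagonal
sim = np.array([
    [1.00, 0.92, 0.30, 0.25],
    [0.92, 1.00, 0.28, 0.31],
    [0.30, 0.28, 1.00, 0.85],
    [0.25, 0.31, 0.85, 1.00],
])

# zero out the diagonal so a sentence is not matched with itself
masked = sim - np.eye(len(sim))

# index of the largest off-diagonal entry: the most similar pair
i, j = np.unravel_index(np.argmax(masked), masked.shape)
```

Here the most similar pair is sentences 1 and 2 (indices 0 and 1), which matches what the diagonal blocks in the heatmap show visually.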
@online{ellis2023,
author = {Ellis, Andrew},
title = {Text Representation},
date = {2023-09-14},
url = {https://virtuelleakademie.github.io/promptly-literate/pages/text-representation.html},
langid = {en}
}